ABSTRACT

We describe a simple probabilistic method to cross-identify astrophysical sources from different catalogs and provide the probability that a source is associated with a source from another catalog or that it has no counterpart. When the positional uncertainty in one of the catalogs is unknown, this method may be used to derive its typical value and even to study its dependence on the size of objects. It may also be applied when the true centers of a source and of its counterpart at another wavelength do not coincide.

We extend this method to the case when there are only one-to-one associations between the catalogs.
1. Introduction

The problem of cross-identifying sources between two catalogs $K$ and $K'$ has previously been studied by Condon et al. (1975), de Ruiter et al. (1977), Prestage & Peacock (1983), Sutherland & Saunders (1992) and Rutledge et al. (2000), among others. As evidenced by recent papers of Budavári & Szalay (2008) and Pineau et al. (2011), this field is still very active and will be even more so with the wealth of forthcoming multiwavelength data. Usually, the association is performed using a "likelihood ratio": this quantity is typically computed as the ratio of the probability of finding, at some distance from a source $M_i \in K$, a source $M'_j \in K'$ if $M'_j$ is a counterpart of $M_i$, to the probability that $M'_j$ is a chance association at the same position, given the local surface density of $K'$-sources. As noticed by Sutherland & Saunders (1992), there has been some confusion in the definition and interpretation of the likelihood ratio and, more importantly, in the estimation of the probability that a source in $K'$ is the counterpart of a source in $K$. When associating sources from catalogs at different wavelengths, some authors include in this likelihood ratio some a priori information on the spectral energy distribution (SED) of the source. When this work began, our primary goal was to build template observational SEDs of galaxies from the optical to the far-infrared for different types of galaxies. We initially intended to cross-identify the IRAS Faint Source Survey (Moshir et al. 1992, 1993) with the LEDA database (Paturel et al. 1995). Because of the large positional inaccuracy of IRAS data, special care was needed to identify optical sources with infrared ones. While IRAS data are by now quite outdated and have been superseded by Spitzer observations, we still think that the procedure we developed at that time may be valuable for other studies. Because we aimed to fit synthetic SEDs to the template observational ones, we could not and did not want to make assumptions on the SED of sources based on their type, since this would have biased the procedure. We therefore rely in what follows only on the positions to associate sources between catalogs.

The method we use is essentially similar to that of Sutherland & Saunders (1992). Because thinking in terms of probabilities rather than of likelihood ratios highlights some implicit assumptions, we nevertheless found it useful, for the sake of clarity, to detail our calculations hereafter; this moreover allows us to extend our work to a case not covered by the papers cited above (see Sect. 4).

We define our notations and make our general assumptions explicit in Sect. 2. In Sect. 3, we compute the probability of association under the assumption that a $K$-source has at most one counterpart in $K'$ but that several $K$-sources may have the same counterpart ("several-to-one" associations). We also determine the fraction of sources with a counterpart and, if unknown, estimate the uncertainty on the position in one of the catalogs. In Sect. 4, we compute the probability of association under the assumption that a $K$-source has at most one counterpart in $K'$ and that no other $K$-source has the same counterpart ("one-to-one" associations). We provide in Sect. 5 some guidance to help the user implement these results. The probability distribution of the relative positions of associated sources is modeled in App. A.

2. Notations and general assumptions 

We consider two catalogs $K$ and $K'$ defined on a common area $S$ of the sky and use the following notations:

- $\#E$: number of elements of any set $E$;

- $M_1, \ldots, M_n$, with $n = \#K$: sources in $K$;

- $M'_1, \ldots, M'_{n'}$, with $n' = \#K'$: sources in $K'$.

We define the following events:

- $c_i$: $M_i$ is in the infinitesimal surface element $\mathrm{d}^2 r_i$ located at $r_i$;

- $c'_j$: $M'_j$ is in the surface element $\mathrm{d}^2 r'_j$ located at $r'_j$;

- $C = \bigcap_{i=1}^{n} c_i$: the coordinates of all $K$-sources are known;

- $C' = \bigcap_{j=1}^{n'} c'_j$: the coordinates of all $K'$-sources are known;

- $A_{i,j}$, with $j > 0$: $M'_j$ is the counterpart of $M_i$;

- $A_{i,0}$: $M_i$ has no counterpart in $K'$, i.e. $A_{i,0} = \bigcap_{j>0} \overline{A_{i,j}}$, where $\overline{\omega}$ denotes the negation of any event $\omega$;

- $A_{0,j}$: $M'_j$ has no counterpart in $K$.

We also write $f$ the a priori probability $P(\bigcup_{j>0} A_{i,j})$ that an element of $K$ has a counterpart in $K'$ (so, $P(A_{i,0}) = 1 - f$); we will see in Sects. 3.2 and 4.2 how to estimate $f$. We moreover assume that any $M_i$ has at most one counterpart in $K'$: $A_{i,j} \cap A_{i,k} = \emptyset$ if $j \neq k$.

Clustering is neglected throughout the paper.

3. Several-to-one associations 

In this section, we do not make any assumption on the number of $K$-sources that may be the counterpart of a given source of $K'$: this is a reasonable hypothesis if the angular resolution in $K'$ (e.g. IRAS) is much poorer than in $K$ (e.g. LEDA), since, in that case, several distinct objects of $K$ may be confused in $K'$. As evidenced by Sect. 3.3, this is also the assumption implicitly made by most of the authors cited in the introduction. We call this the "several-to-one" case.

3.1. Probability of association: all-sky computation 

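We want to compute, in the several-to-one case, the probability $P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C')$ of association of sources $M_i$ and $M'_j$ ($j > 0$), or the probability that $M_i$ has no counterpart ($j = 0$), knowing the coordinates of all the objects in $K$ and $K'$. Remembering that, for any events $\omega_1$, $\omega_2$ and $\omega_3$, $P(\omega_1 \mid \omega_2) = P(\omega_1 \cap \omega_2)/P(\omega_2)$ and $P(\omega_1 \cap \omega_2 \mid \omega_3) = P(\omega_1 \mid \omega_2 \cap \omega_3)\, P(\omega_2 \mid \omega_3)$, we have

$$P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') = \frac{P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C')}{P_{\mathrm{s:o}}(C \mid C')}. \qquad (1)$$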

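We first compute $P_{\mathrm{s:o}}(C \mid C')$. Using the symbol $\biguplus$ for mutually exclusive events instead of $\bigcup$, we obtain

$$P_{\mathrm{s:o}}(C \mid C') = P_{\mathrm{s:o}}\Big(C \cap \biguplus_{j_1=0}^{n'} \biguplus_{j_2=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big) = \sum_{j_1=0}^{n'} \sum_{j_2=0}^{n'} \cdots \sum_{j_n=0}^{n'} P_{\mathrm{s:o}}\Big(C \cap \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big)$$

$$= \sum_{j_1=0}^{n'} \sum_{j_2=0}^{n'} \cdots \sum_{j_n=0}^{n'} P_{\mathrm{s:o}}\Big(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big)\, P_{\mathrm{s:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big). \qquad (2)$$

One has

$$P_{\mathrm{s:o}}\Big(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big) = P_{\mathrm{s:o}}\Big(c_1 \Bigm| \bigcap_{k=2}^{n} c_k \cap \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big)\, P_{\mathrm{s:o}}\Big(\bigcap_{k=2}^{n} c_k \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big) = \prod_{\ell=1}^{n} P_{\mathrm{s:o}}\Big(c_\ell \Bigm| \bigcap_{k=\ell+1}^{n} c_k \cap \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big) \qquad (3)$$

by iteration.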

If $j_\ell \neq 0$, since $M_\ell$ is associated with $M'_{j_\ell}$ only,

$$P_{\mathrm{s:o}}\Big(c_\ell \Bigm| \bigcap_{k=\ell+1}^{n} c_k \cap \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big) = P_{\mathrm{s:o}}(c_\ell \mid A_{\ell,j_\ell} \cap c'_{j_\ell}) = \xi_{\ell,j_\ell}\, \mathrm{d}^2 r_\ell, \qquad (4)$$

where

$$\xi_{i,j} = \frac{\exp\big(-\tfrac{1}{2}\, r_{i,j}^{\mathsf T} \cdot \Gamma_{i,j}^{-1} \cdot r_{i,j}\big)}{2\pi\, (\det \Gamma_{i,j})^{1/2}},$$

$r_{i,j} = r'_j - r_i$, and the covariance matrix $\Gamma_{i,j}$ of $r_{i,j}$ is computed as detailed in App. A. (Note that, in the several-to-one case considered here, the computation of $P_{\mathrm{s:o}}(C \mid C')$ is easier than that of $P_{\mathrm{s:o}}(C' \mid C)$: because several $M_\ell$ may be associated with the same $M'_k$, the latter would require calculating $P_{\mathrm{s:o}}(c'_k \mid \bigcap_{\ell:\, j_\ell = k} [c_\ell \cap A_{\ell,j_\ell}])$. This does not matter in the one-to-one case studied in Sect. 4.)

¹ For the sake of clarity, let us mention that we adopt the same decreasing order of precedence of operators as in Mathematica (Wolfram 1996): × and /; ∏; Σ; + and −.

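If $j_\ell = 0$, since $M_\ell$ is not associated with any source in $K'$ and clustering is neglected,

$$P_{\mathrm{s:o}}\Big(c_\ell \Bigm| \bigcap_{k=\ell+1}^{n} c_k \cap \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big) = P_{\mathrm{s:o}}(c_\ell \mid A_{\ell,0}) = \xi_{\ell,0}\, \mathrm{d}^2 r_\ell, \qquad (5)$$

where $\xi_{\ell,0} = 1/S$ if we assume a uniform distribution of $K$-sources without counterpart as prior. From Eqs. (3), (4) and (5), it follows that

$$P_{\mathrm{s:o}}\Big(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big) = \mathcal{A} \prod_{k=1}^{n} \xi_{k,j_k}, \qquad (6)$$

where $\mathcal{A} = \prod_{i=1}^{n} \mathrm{d}^2 r_i$.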
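To make Eq. (4) concrete, the density $\xi_{i,j}$ can be evaluated as in the following minimal sketch. It assumes (our assumption, not part of the method) that both positions are already expressed as 2-vectors in a common flat tangent-plane basis and that $\Gamma_{i,j}$ has been obtained as in App. A; the function name `xi_density` is hypothetical.

```python
import numpy as np


def xi_density(r_i, r_prime_j, gamma_ij):
    """Bivariate normal density xi_{i,j} of Eq. (4).

    r_i, r_prime_j: positions of M_i and M'_j as 2-vectors expressed in a
        common local tangent-plane basis (assumed already projected).
    gamma_ij: 2x2 covariance matrix Gamma_{i,j} of r_{i,j} = r'_j - r_i.
    """
    r_ij = np.asarray(r_prime_j, dtype=float) - np.asarray(r_i, dtype=float)
    gamma_ij = np.asarray(gamma_ij, dtype=float)
    # r^T Gamma^{-1} r, computed by solving a linear system instead of inverting Gamma.
    chi2 = r_ij @ np.linalg.solve(gamma_ij, r_ij)
    return float(np.exp(-0.5 * chi2) / (2.0 * np.pi * np.sqrt(np.linalg.det(gamma_ij))))
```

Solving $\Gamma_{i,j}\,x = r_{i,j}$ rather than forming $\Gamma_{i,j}^{-1}$ is a common numerical choice; it does not change the result.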

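We now compute $P_{\mathrm{s:o}}(\bigcap_{k=1}^{n} A_{k,j_k} \mid C')$. Without any other assumption, $P_{\mathrm{s:o}}(\bigcap_{k=1}^{n} A_{k,j_k} \mid C') = P_{\mathrm{s:o}}(\bigcap_{k=1}^{n} A_{k,j_k})$. Let $m = \#\{j_k > 0;\ k \in \llbracket 1, n \rrbracket\}$. Since a given $M'_j$ may be the counterpart of several $M_k$ (i.e. the events $(A_{k,j_k})_{k \in \llbracket 1, n \rrbracket}$ are independent whatever the values of the indices $j_k$),

$$P_{\mathrm{s:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k}\Big) = \prod_{k=1}^{n} P_{\mathrm{s:o}}(A_{k,j_k}). \qquad (7)$$

As $P_{\mathrm{s:o}}(A_{k,0}) = 1 - f$ and $P_{\mathrm{s:o}}(A_{k,j_k}) = f/n'$ for $j_k > 0$,

$$P_{\mathrm{s:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k}\Big) = \Big(\frac{f}{n'}\Big)^{m} (1-f)^{n-m}. \qquad (8)$$

Hence, from Eqs. (2), (6), (7) and (8),

$$P_{\mathrm{s:o}}(C \mid C') = \mathcal{A} \sum_{j_1=0}^{n'} \sum_{j_2=0}^{n'} \cdots \sum_{j_n=0}^{n'} \Big(\frac{f}{n'}\Big)^{m} (1-f)^{n-m} \prod_{k=1}^{n} \xi_{k,j_k} = \mathcal{A}\, L_{\mathrm{s:o}}, \qquad (9)$$

where

$$L_{\mathrm{s:o}} = \sum_{j_1=0}^{n'} \sum_{j_2=0}^{n'} \cdots \sum_{j_n=0}^{n'} \prod_{k=1}^{n} \zeta_{k,j_k} = \prod_{k=1}^{n} \sum_{j=0}^{n'} \zeta_{k,j}$$

is the likelihood to observe the $K$-sources at their positions if the positions of $K'$-sources are known, $\zeta_{k,0} = (1-f)\, \xi_{k,0}$ and $\zeta_{k,j} = f\, \xi_{k,j}/n'$ if $j > 0$.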
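Because $L_{\mathrm{s:o}}$ factorizes over the $K$-sources, its logarithm is cheap to evaluate. The sketch below is illustrative only: it assumes that the $\xi_{i,j}$ ($j > 0$) are supplied as an $n \times n'$ array and that $\xi_{i,0} = 1/S$ as in Eq. (5); the name `log_likelihood_s_o` is hypothetical.

```python
import numpy as np


def log_likelihood_s_o(xi, f, n_prime, area):
    """ln L_{s:o} = sum_i ln sum_j zeta_{i,j}, following Eq. (9).

    xi: (n, n') array of xi_{i,j} for j = 1..n'.
    f: a priori fraction of K-sources with a counterpart (0 < f < 1).
    n_prime, area: n' and the common sky area S; xi_{i,0} = 1/S is assumed.
    """
    xi = np.asarray(xi, dtype=float)
    zeta_0 = (1.0 - f) / area               # zeta_{i,0} = (1 - f) xi_{i,0}
    zeta_pos = f * xi / n_prime             # zeta_{i,j} = f xi_{i,j} / n', j > 0
    return float(np.sum(np.log(zeta_0 + zeta_pos.sum(axis=1))))
```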

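The computation of $P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C')$ is similar to that of $P_{\mathrm{s:o}}(C \mid C')$:

$$P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C') = P_{\mathrm{s:o}}\Big(C \cap A_{i,j} \cap \biguplus_{j_1=0}^{n'} \cdots \biguplus_{j_{i-1}=0}^{n'} \biguplus_{j_{i+1}=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{\substack{k=1\\ k \neq i}}^{n} A_{k,j_k} \Bigm| C'\Big) = P_{\mathrm{s:o}}\Big(C \cap \biguplus_{j_1=0}^{n'} \cdots \biguplus_{j_{i-1}=0}^{n'} \biguplus_{j_{i+1}=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big)$$

$$= \sum_{j_1=0}^{n'} \cdots \sum_{j_{i-1}=0}^{n'} \sum_{j_{i+1}=0}^{n'} \cdots \sum_{j_n=0}^{n'} P_{\mathrm{s:o}}\Big(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big)\, P_{\mathrm{s:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big), \qquad (10)$$

where we have put $j_i = j$.

Let $m^* = \#\{j_k > 0;\ k \in \llbracket 1, n \rrbracket\}$ (indices $j_k$ are those of Eq. (10)). As for $P_{\mathrm{s:o}}(C \mid C')$,

$$P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C') = \mathcal{A} \sum_{j_1=0}^{n'} \cdots \sum_{j_{i-1}=0}^{n'} \sum_{j_{i+1}=0}^{n'} \cdots \sum_{j_n=0}^{n'} \Big(\frac{f}{n'}\Big)^{m^*} (1-f)^{n-m^*} \prod_{k=1}^{n} \xi_{k,j_k} = \mathcal{A}\, \zeta_{i,j} \prod_{\substack{k=1\\ k \neq i}}^{n} \sum_{\ell=0}^{n'} \zeta_{k,\ell}. \qquad (11)$$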

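Finally, from Eqs. (1), (9), (10) and (11),

$$P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') = \frac{\zeta_{i,j} \prod_{k \neq i} \sum_{\ell=0}^{n'} \zeta_{k,\ell}}{\prod_{k=1}^{n} \sum_{\ell=0}^{n'} \zeta_{k,\ell}} = \frac{\zeta_{i,j}}{\sum_{k=0}^{n'} \zeta_{i,k}} \qquad (12)$$

$$= \begin{cases} \dfrac{f\, \xi_{i,j}}{(1-f)\, n'/S + f \sum_{k=1}^{n'} \xi_{i,k}} & \text{if } j > 0, \\[2ex] \dfrac{(1-f)\, n'/S}{(1-f)\, n'/S + f \sum_{k=1}^{n'} \xi_{i,k}} & \text{if } j = 0. \end{cases} \qquad (13)$$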
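Eq. (13) amounts to a row-wise normalization of the weights $(1-f)\, n'/S$ and $f\, \xi_{i,j}$. A minimal vectorized sketch, assuming the $\xi_{i,j}$ are stored in an $n \times n'$ NumPy array (the function name is hypothetical; $n'$ is passed explicitly so that the array may later be truncated to neighbors only, as in Sect. 5.1.1):

```python
import numpy as np


def p_several_to_one(xi, f, n_prime, area):
    """P_{s:o}(A_{i,j} | C ∩ C') for every pair, from Eq. (13).

    xi: (n, n') array of xi_{i,k}, k = 1..n' (entries may be zero).
    Returns an (n, n'+1) array whose column 0 holds P_{s:o}(A_{i,0} | C ∩ C').
    """
    xi = np.asarray(xi, dtype=float)
    no_match = np.full((xi.shape[0], 1), (1.0 - f) * n_prime / area)  # (1 - f) n'/S
    weights = np.hstack([no_match, f * xi])                             # f xi_{i,j}
    return weights / weights.sum(axis=1, keepdims=True)
```

Each row of the result sums to 1, as required by $\sum_{j=0}^{n'} P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') = 1$.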
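The probability $P_{\mathrm{s:o}}(A_{0,j} \mid C \cap C')$ that $M'_j$ has no counterpart in $K$ can be computed in this way:

$$P_{\mathrm{s:o}}(A_{0,j} \cap C \mid C') = P_{\mathrm{s:o}}\Big(C \cap A_{0,j} \cap \biguplus_{j_1=0}^{n'} \biguplus_{j_2=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big) = P_{\mathrm{s:o}}\Big(C \cap \biguplus_{\substack{j_1=0\\ j_1 \neq j}}^{n'} \biguplus_{\substack{j_2=0\\ j_2 \neq j}}^{n'} \cdots \biguplus_{\substack{j_n=0\\ j_n \neq j}}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big)$$

$$= \mathcal{A} \sum_{\substack{j_1=0\\ j_1 \neq j}}^{n'} \sum_{\substack{j_2=0\\ j_2 \neq j}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \neq j}}^{n'} \prod_{k=1}^{n} \zeta_{k,j_k} = \mathcal{A} \prod_{k=1}^{n} \sum_{\substack{j_k=0\\ j_k \neq j}}^{n'} \zeta_{k,j_k}$$

and

$$P_{\mathrm{s:o}}(A_{0,j} \mid C \cap C') = \frac{\prod_{k=1}^{n} \sum_{j_k \neq j} \zeta_{k,j_k}}{\prod_{k=1}^{n} \sum_{j_k=0}^{n'} \zeta_{k,j_k}} = \prod_{k=1}^{n} \big[1 - P_{\mathrm{s:o}}(A_{k,j} \mid C \cap C')\big]. \qquad (14)$$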
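Given the matrix of probabilities from Eq. (13), Eq. (14) is a column-wise product. This is a sketch under the assumption that the probabilities are stored as an $n \times (n'+1)$ array with the "no counterpart" column first (an illustrative layout, not one prescribed by the text); the function name is hypothetical.

```python
import numpy as np


def p_no_counterpart_in_k(p):
    """P_{s:o}(A_{0,j} | C ∩ C') for j = 1..n', from Eq. (14).

    p: (n, n'+1) array of P_{s:o}(A_{i,j} | C ∩ C'); column 0 (no counterpart
       in K') is not used here.
    """
    p = np.asarray(p, dtype=float)
    return np.prod(1.0 - p[:, 1:], axis=0)
```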

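3.2. Fraction of sources with a counterpart and other unknown parameters

3.2.1. Estimates

Besides $f$, the probabilities $P(A_{i,j} \mid C \cap C')$ may depend on other unknown parameters, e.g. $\sigma$ and $\nu$ (cf. App. A). Let us write them $x_1$, $x_2$, etc., and $x = (x_1, x_2, \ldots)$. An estimate $\hat x$ of $x$ may be obtained by maximizing the likelihood $L$ with respect to $x$ (with the constraint $\hat f \in [0, 1]$) or, equivalently, by finding the solution $\hat x$ of

$$\frac{\partial \ln L}{\partial x} = 0. \qquad (15)$$

For any parameter $x_p$, as all the $\zeta_{i,j}$ are strictly positive and $\ln L_{\mathrm{s:o}} = \sum_{i=1}^{n} \ln \sum_{j=0}^{n'} \zeta_{i,j}$ (Eq. (9)),

$$\frac{\partial \ln L_{\mathrm{s:o}}}{\partial x_p} = \sum_{i=1}^{n} \frac{\sum_{j=0}^{n'} \partial \zeta_{i,j}/\partial x_p}{\sum_{k=0}^{n'} \zeta_{i,k}} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C'). \qquad (16)$$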



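Let us consider in particular the case $x_p = f$. Note that $\partial \ln \zeta_{i,0}/\partial f = -1/(1-f)$ and $\partial \ln \zeta_{i,j}/\partial f = 1/f$ for $j > 0$. Since $\sum_{j=0}^{n'} P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') = 1$,

$$\sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial f}\, P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') = -\frac{P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')}{1-f} + \sum_{j=1}^{n'} \frac{P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C')}{f} = -\frac{P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')}{1-f} + \frac{1 - P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')}{f} = \frac{(1-f) - P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')}{f\,(1-f)}. \qquad (17)$$

Summing on $i$, we obtain

$$\frac{\partial \ln L_{\mathrm{s:o}}}{\partial f} = \frac{n\,(1-f) - \sum_{i=1}^{n} P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')}{f\,(1-f)}. \qquad (18)$$

So, as expected, an estimate of the probability that a source in $K$ has a counterpart in $K'$ is given by

$$\hat f_{\mathrm{s:o}} = 1 - \frac{1}{n} \sum_{i=1}^{n} P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C') \qquad (19)$$

or, equivalently,

$$\hat f_{\mathrm{s:o}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n'} P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C'), \qquad (20)$$

where the probabilities are evaluated at $f = \hat f_{\mathrm{s:o}}$. Note that, since $\partial^2 \zeta_{i,j}/\partial f^2 = 0$ for all $(i, j) \in \llbracket 1, n \rrbracket \times \llbracket 0, n' \rrbracket$,

$$\frac{\partial^2 \ln L_{\mathrm{s:o}}}{\partial f^2} = -\sum_{i=1}^{n} \bigg(\frac{\sum_{j=0}^{n'} \partial \zeta_{i,j}/\partial f}{\sum_{j=0}^{n'} \zeta_{i,j}}\bigg)^2 < 0 \qquad (21)$$

for all $f$, so $\partial \ln L_{\mathrm{s:o}}/\partial f$ has at most one zero in $[0, 1]$: $\hat f_{\mathrm{s:o}}$ is unique.

One may also compute an estimate of the fraction $f'$ of $K'$-sources with a counterpart from

$$\hat f'_{\mathrm{s:o}} = 1 - \frac{1}{n'} \sum_{j=1}^{n'} P_{\mathrm{s:o}}(A_{0,j} \mid C \cap C'). \qquad (22)$$

One can easily check from Eqs. (20), (22) and (14) that $\hat f_{\mathrm{s:o}}/n' \geq \hat f'_{\mathrm{s:o}}/n$ in the several-to-one case (since $1 - \prod_k (1 - p_k) \leq \sum_k p_k$ for any $p_k \in [0, 1]$).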

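3.2.2. Uncertainties

It may be interesting to know the uncertainties on the unknown parameters. For large numbers of sources, the covariance matrix $V$ of $\hat x$ is asymptotically given by

$$(V^{-1})_{p,q} = -\frac{\partial^2 \ln L}{\partial x_p\, \partial x_q}\bigg|_{x = \hat x} \qquad (23)$$

(Kendall & Stuart 1979).

Let us write with a circumflex accent all the quantities calculated at $x = \hat x$. From

$$\frac{\partial^2 \ln L}{\partial x_p\, \partial x_q} = \frac{1}{P(C \mid C')}\, \frac{\partial^2 P(C \mid C')}{\partial x_p\, \partial x_q} - \frac{1}{P^2(C \mid C')}\, \frac{\partial P(C \mid C')}{\partial x_p}\, \frac{\partial P(C \mid C')}{\partial x_q} \qquad (24)$$

and Eq. (15), one obtains

$$\widehat{\frac{\partial^2 \ln L}{\partial x_p\, \partial x_q}} = \frac{1}{\hat P(C \mid C')}\, \widehat{\frac{\partial^2 P(C \mid C')}{\partial x_p\, \partial x_q}}. \qquad (25)$$

For any product of strictly positive functions $g_k$ of some variable $y$,

$$\frac{\partial \prod_k g_k}{\partial y} = \prod_k g_k \sum_k \frac{\partial \ln g_k}{\partial y}, \qquad (26)$$

so, using Eq. (11),

$$\frac{\partial P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C')}{\partial x_p} = P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C') \bigg[\frac{\partial \ln \zeta_{i,j}}{\partial x_p} + \sum_{\substack{k=1\\ k \neq i}}^{n} \sum_{\ell=0}^{n'} \frac{\partial \ln \zeta_{k,\ell}}{\partial x_p}\, P_{\mathrm{s:o}}(A_{k,\ell} \mid C \cap C')\bigg]. \qquad (27)$$

Likewise, from Eqs. (9), (11) and (26), $\partial P_{\mathrm{s:o}}(C \mid C')/\partial x_q = \sum_{i=1}^{n} \sum_{j=0}^{n'} (\partial \ln \zeta_{i,j}/\partial x_q)\, P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C')$, hence

$$\frac{\partial^2 P_{\mathrm{s:o}}(C \mid C')}{\partial x_p\, \partial x_q} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \bigg[\frac{\partial^2 \ln \zeta_{i,j}}{\partial x_p\, \partial x_q}\, P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C') + \frac{\partial \ln \zeta_{i,j}}{\partial x_q}\, \frac{\partial P_{\mathrm{s:o}}(A_{i,j} \cap C \mid C')}{\partial x_p}\bigg]. \qquad (28)$$

For $x = \hat x$, the sum over $k \neq i$ in Eq. (27) can be written

$$\sum_{k=1}^{n} \sum_{\ell=0}^{n'} \frac{\partial \ln \zeta_{k,\ell}}{\partial x_p}\, \hat P_{\mathrm{s:o}}(A_{k,\ell} \mid C \cap C') - \sum_{\ell=0}^{n'} \frac{\partial \ln \zeta_{i,\ell}}{\partial x_p}\, \hat P_{\mathrm{s:o}}(A_{i,\ell} \mid C \cap C') = -\sum_{\ell=0}^{n'} \frac{\partial \ln \zeta_{i,\ell}}{\partial x_p}\, \hat P_{\mathrm{s:o}}(A_{i,\ell} \mid C \cap C'),$$

since the first term on the left-hand side is zero from Eqs. (15) and (16). Finally, combining Eqs. (25), (27) and (28) and dividing by $\hat P_{\mathrm{s:o}}(C \mid C')$, we obtain

$$\widehat{\frac{\partial^2 \ln L_{\mathrm{s:o}}}{\partial x_p\, \partial x_q}} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \bigg(\frac{\partial^2 \ln \zeta_{i,j}}{\partial x_p\, \partial x_q} + \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, \frac{\partial \ln \zeta_{i,j}}{\partial x_q}\bigg)\, \hat P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') - \sum_{i=1}^{n} \bigg(\sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, \hat P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C')\bigg) \bigg(\sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_q}\, \hat P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C')\bigg). \qquad (29)$$

In particular, for $x_p = x_q = f$, $\partial^2 \ln \zeta_{i,j}/\partial f^2 + (\partial \ln \zeta_{i,j}/\partial f)^2 = 0$, whether $j = 0$ or not. From Eqs. (17) and (19),

$$\widehat{\frac{\partial^2 \ln L_{\mathrm{s:o}}}{\partial f^2}} = -\sum_{i=1}^{n} \bigg(\frac{1 - \hat f_{\mathrm{s:o}} - \hat P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')}{\hat f_{\mathrm{s:o}}\,(1 - \hat f_{\mathrm{s:o}})}\bigg)^2 = \frac{n}{\hat f_{\mathrm{s:o}}^2} - \frac{1}{\hat f_{\mathrm{s:o}}^2} \sum_{i=1}^{n} \bigg(\frac{\hat P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')}{1 - \hat f_{\mathrm{s:o}}}\bigg)^2. \qquad (30)$$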
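When $f$ is the only unknown, $V$ reduces to a scalar and Eqs. (23) and (30) give the asymptotic uncertainty on $\hat f_{\mathrm{s:o}}$ directly. A minimal sketch, assuming the $\hat P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')$ have already been computed at $f = \hat f_{\mathrm{s:o}}$ (function name hypothetical):

```python
import numpy as np


def sigma_f_s_o(p0, f_hat):
    """Asymptotic standard deviation of f_hat_{s:o} from Eqs. (23) and (30).

    p0: P_{s:o}(A_{i,0} | C ∩ C') for i = 1..n, evaluated at f = f_hat.
    Assumes f is the only free parameter, so V = -1 / (d^2 ln L / df^2).
    """
    p0 = np.asarray(p0, dtype=float)
    n = p0.size
    d2 = n / f_hat**2 - np.sum((p0 / (1.0 - f_hat)) ** 2) / f_hat**2  # Eq. (30)
    return float(np.sqrt(-1.0 / d2))
```

As a sanity check, if every $\hat P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')$ is exactly 0 or 1, Eq. (30) reduces to $-n/[\hat f_{\mathrm{s:o}}(1 - \hat f_{\mathrm{s:o}})]$, i.e. to the binomial variance $\hat f_{\mathrm{s:o}}(1 - \hat f_{\mathrm{s:o}})/n$.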



3.3. Probability of association: local computation 



In the several-to-one case, a purely local computation of the probability of association between a given $M_i$ and some $M'_j$ ($j > 0$), or of the probability that $M_i$ has no counterpart in $K'$, is also possible.

Let us consider a region $D_i$, of area $S_i$, containing the position of $M_i$, and such that we can safely hypothesize that the $K'$-counterpart of $M_i$, if any, will be inside. We assume that the local surface density $\rho'_i$ of $K'$-sources unrelated to $M_i$ is uniform on $D_i$. To avoid biasing the estimate if $M_i$ has a counterpart, $\rho'_i$ may be computed from the number of $K'$-sources in a region surrounding, but not overlapping, $D_i$.

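Besides the $A_{i,j}$, we consider the following events:

- $N'_i$: $D_i$ contains $n'_i$ sources of $K'$;

- $C'_i = \bigcap_{j \in I_i} c'_j$, where $I_i = \{j \mid M'_j \in D_i\}$.

We want to compute the probability that a source $M'_j$ in $D_i$ is the counterpart of $M_i$, given the positions of the neighbors, i.e. $P_{\mathrm{loc}}(A_{i,j} \mid C'_i \cap N'_i)$. We have

$$P_{\mathrm{loc}}(A_{i,j} \mid C'_i \cap N'_i) = \frac{P_{\mathrm{loc}}(A_{i,j} \cap C'_i \cap N'_i)}{P_{\mathrm{loc}}(C'_i \cap N'_i)} = \frac{P_{\mathrm{loc}}(C'_i \cap A_{i,j} \cap N'_i)}{\sum_{k \in I_i \cup \{0\}} P_{\mathrm{loc}}(C'_i \cap A_{i,k} \cap N'_i)} = \frac{P_{\mathrm{loc}}(C'_i \mid A_{i,j} \cap N'_i)\, P_{\mathrm{loc}}(A_{i,j} \mid N'_i)}{\sum_{k \in I_i \cup \{0\}} P_{\mathrm{loc}}(C'_i \mid A_{i,k} \cap N'_i)\, P_{\mathrm{loc}}(A_{i,k} \mid N'_i)}.$$

If $j > 0$, $P_{\mathrm{loc}}(A_{i,j} \mid N'_i) = P_{\mathrm{loc}}(\bigcup_{k \in I_i} A_{i,k} \mid N'_i)/n'_i$ (one sees here why the event $N'_i$ was defined; otherwise, $P_{\mathrm{loc}}(A_{i,j})$ could not be computed as $P_{\mathrm{loc}}(\bigcup_{k \in I_i} A_{i,k})/n'_i$ because $n'_i$ would be undefined). Now,

$$P_{\mathrm{loc}}\Big(\bigcup_{k \in I_i} A_{i,k} \Bigm| N'_i\Big) = \frac{P_{\mathrm{loc}}(N'_i \cap \bigcup_{k \in I_i} A_{i,k})}{P_{\mathrm{loc}}(N'_i)} = \frac{P_{\mathrm{loc}}(N'_i \mid \bigcup_{k \in I_i} A_{i,k})\, P_{\mathrm{loc}}(\bigcup_{k \in I_i} A_{i,k})}{P_{\mathrm{loc}}(N'_i \mid A_{i,0})\, P_{\mathrm{loc}}(A_{i,0}) + P_{\mathrm{loc}}(N'_i \mid \bigcup_{k \in I_i} A_{i,k})\, P_{\mathrm{loc}}(\bigcup_{k \in I_i} A_{i,k})}.$$

If clustering is negligible, the number of sources randomly distributed with a mean surface density $\rho'_i$ in an area $S_i$ follows a Poisson distribution, so

$$P_{\mathrm{loc}}\Big(N'_i \Bigm| \bigcup_{k \in I_i} A_{i,k}\Big) = \frac{(\rho'_i S_i)^{n'_i - 1} \exp(-\rho'_i S_i)}{(n'_i - 1)!} \qquad (n'_i - 1 \text{ random sources in } S_i)$$

and

$$P_{\mathrm{loc}}(N'_i \mid A_{i,0}) = \frac{(\rho'_i S_i)^{n'_i} \exp(-\rho'_i S_i)}{n'_i!} \qquad (n'_i \text{ random sources in } S_i).$$

Thus, since $D_i$ contains the counterpart of $M_i$ if there is one, $P_{\mathrm{loc}}(\bigcup_{k \in I_i} A_{i,k}) = f$ and $P_{\mathrm{loc}}(A_{i,0}) = 1 - f$, so that

$$P_{\mathrm{loc}}(A_{i,j} \mid N'_i) = \begin{cases} \dfrac{f}{n'_i f + (1-f)\, \rho'_i S_i} & \text{if } j > 0, \\[2ex] \dfrac{(1-f)\, \rho'_i S_i}{n'_i f + (1-f)\, \rho'_i S_i} & \text{if } j = 0. \end{cases}$$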
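For $j > 0$,

$$P_{\mathrm{loc}}(C'_i \mid A_{i,j} \cap N'_i) = \xi_{i,j}\, \frac{\prod_{k \in I_i} \mathrm{d}^2 r'_k}{S_i^{\,n'_i - 1}}$$

(rigorously, $\xi_{i,j}$ should be replaced by $\xi_{i,j}/P_{\mathrm{loc}}(M'_j \in D_i \mid A_{i,j})$, but $P_{\mathrm{loc}}(M'_j \notin D_i \mid A_{i,j})$ is negligible), and

$$P_{\mathrm{loc}}(C'_i \mid A_{i,0} \cap N'_i) = \frac{\prod_{k \in I_i} \mathrm{d}^2 r'_k}{S_i^{\,n'_i}}.$$

Finally,

$$P_{\mathrm{loc}}(A_{i,j} \mid C'_i \cap N'_i) = \begin{cases} \dfrac{f\, \mathrm{LR}_{i,j}}{(1-f) + f \sum_{k \in I_i} \mathrm{LR}_{i,k}} & \text{if } j > 0, \\[2ex] \dfrac{1-f}{(1-f) + f \sum_{k \in I_i} \mathrm{LR}_{i,k}} & \text{if } j = 0, \end{cases} \qquad (31)$$

where $\mathrm{LR}_{i,k} = \xi_{i,k}/\rho'_i$ is the "likelihood ratio". Mutatis mutandis, one obtains the same result as Eq. (14) of Pineau et al. (2011) and the aforementioned authors. When extended to the whole sky (i.e. $S_i \to S$), $\rho'_i$ is replaced by $n'/S$ in Eq. (31), $\sum_{k \in I_i}$ by $\sum_{k=1}^{n'}$, and one recovers Eq. (13).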
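Eq. (31) only needs the likelihood ratios of the $K'$-sources in $D_i$. The following sketch assumes that $\rho'_i$ has been estimated beforehand (e.g. from a region surrounding $D_i$, as suggested above) and that the $\xi_{i,k}$ of the neighbors are given; the function name is hypothetical.

```python
import numpy as np


def p_local(xi_neighbors, rho_prime, f):
    """P_loc(A_{i,j} | C'_i ∩ N'_i) from Eq. (31), for a single K-source M_i.

    xi_neighbors: xi_{i,k} for the K'-sources k in I_i (those inside D_i).
    rho_prime: local surface density rho'_i of K'-sources unrelated to M_i.
    Returns (P for j = 0, array of P for each neighbor in the given order).
    """
    lr = np.asarray(xi_neighbors, dtype=float) / rho_prime   # LR_{i,k}
    denominator = (1.0 - f) + f * lr.sum()
    return (1.0 - f) / denominator, f * lr / denominator
```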

The index $j_i$ of the most likely counterpart $M'_{j_i}$ of $M_i$ is the value of $j > 0$ maximizing $\mathrm{LR}_{i,j}$. Usually, $\sum_{k \neq j_i} \mathrm{LR}_{i,k} \ll \mathrm{LR}_{i,j_i}$, so

$$P_{\mathrm{s:o}}(A_{i,j_i} \mid C \cap C') \approx \frac{f\, \mathrm{LR}_{i,j_i}}{(1-f) + f\, \mathrm{LR}_{i,j_i}}.$$

As a "poor man's" recipe, if the value of $f$ is unknown and not too close to either 0 or 1, an association may be considered as true if $\mathrm{LR}_{i,j_i} \gg 1$ and as false if $\mathrm{LR}_{i,j_i} \ll 1$. Where to set the boundary between true associations and false ones is somewhat arbitrary. For a large sample, however, $f$ can be determined from the distribution of the positions of all the sources, as shown in Sect. 3.2.

4. One-to-one associations 

In Sect. 3, a given $M'_j$ may be associated with several $M_i$: the probabilities are actually asymmetric in $M_i$ and $M'_j$ and, while $\sum_{j=0}^{n'} P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') = 1$ for all $M_i$, one may well have $\sum_{i=1}^{n} P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') > 1$ for some sources $M'_j$.

Here, we assume not only that each $K$-source is associated with at most one $K'$-source, but also that each $K'$-source is associated with at most one $K$-source. We call this the "one-to-one" case and denote by $P_{\mathrm{o:o}}$ the probabilities calculated under this assumption. As far as we know, and despite some attempt by Rutledge et al. (2000), this problem has not been solved previously.

Since a potential $K'$-counterpart of $M_i$ within some neighborhood $D_i$ of $M_i$ might in fact be the true counterpart of another source $M_k$ outside of $D_i$, there is no obvious way to extend the exact local several-to-one computation of Sect. 3.3 to the one-to-one case. We therefore have to consider either the whole sky, as in Sect. 3.1, or at least some region around both $M_i$ and $M'_j$ large enough to neglect side effects.

In the case of one-to-one associations, a source of $K$ and a source of $K'$ play symmetrical roles; in particular, $P_{\mathrm{o:o}}(A_{i,j}) = f/n' = f'/n$. However, for practical reasons (cf. Eq. (36)), we name $K$ the catalog with the fewer objects and $K'$ the other one, so $n \leq n'$ in the following.

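4.1. Probability of association

We want to compute $P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C')$ for $j > 0$. We still have

$$P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C') = \frac{P_{\mathrm{o:o}}(A_{i,j} \cap C \mid C')}{P_{\mathrm{o:o}}(C \mid C')} \qquad (32)$$

and

$$P_{\mathrm{o:o}}(C \mid C') = P_{\mathrm{o:o}}\Big(C \cap \biguplus_{j_1=0}^{n'} \biguplus_{j_2=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big).$$

As $A_{i,j} \cap A_{k,\ell} = \emptyset$ if $i \neq k$ and $j = \ell > 0$, this reduces to

$$P_{\mathrm{o:o}}(C \mid C') = P_{\mathrm{o:o}}\Big(C \cap \biguplus_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \biguplus_{\substack{j_2=0\\ j_2 \notin J_1}}^{n'} \cdots \biguplus_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big),$$

where $J_0 = \emptyset$ and $J_k$ is defined iteratively for all $k \in \llbracket 1, n \rrbracket$ by $J_k = (J_{k-1} \cup \{j_k\}) \setminus \{0\}$. Hence,

$$P_{\mathrm{o:o}}(C \mid C') = \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0\\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} P_{\mathrm{o:o}}\Big(C \cap \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big)$$

$$= \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0\\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} P_{\mathrm{o:o}}\Big(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big)\, P_{\mathrm{o:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big). \qquad (33)$$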



As in the several-to-one case,

$$P_{\mathrm{o:o}}\Big(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big) = \mathcal{A} \prod_{k=1}^{n} \xi_{k,j_k}. \qquad (34)$$



We now have to compute $P_{\mathrm{o:o}}(\bigcap_{k=1}^{n} A_{k,j_k} \mid C') = P_{\mathrm{o:o}}(\bigcap_{k=1}^{n} A_{k,j_k})$. Let $m = \#J_n$ and let $X$ be a random variable describing the number of associations between $K$ and $K'$:

$$P_{\mathrm{o:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k}\Big) = P_{\mathrm{o:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| X = m\Big)\, P_{\mathrm{o:o}}(X = m) + P_{\mathrm{o:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| X \neq m\Big)\, P_{\mathrm{o:o}}(X \neq m).$$

Since $P_{\mathrm{o:o}}(\bigcap_{k=1}^{n} A_{k,j_k} \mid X \neq m) = 0$, one just has to compute $P_{\mathrm{o:o}}(\bigcap_{k=1}^{n} A_{k,j_k} \mid X = m)$ and $P_{\mathrm{o:o}}(X = m)$.

There are $n!/(m!\,[n-m]!)$ choices of $m$ elements among $n$ in $K$, and $n'!/(m!\,[n'-m]!)$ of $m$ elements among $n'$ in $K'$. The number of permutations of $m$ elements is $m!$, so the total number of one-to-one associations of $m$ elements from $K$ to $m$ elements of $K'$ is

$$\frac{n!}{m!\,(n-m)!}\, \frac{n'!}{m!\,(n'-m)!}\, m!.$$

The inverse of this number is

$$P_{\mathrm{o:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| X = m\Big) = \frac{m!\,(n-m)!\,(n'-m)!}{n!\,n'!}. \qquad (35)$$

With our definition of $K$ and $K'$, $n \leq n'$, so all the elements of $K$ may jointly have a counterpart in $K'$. Therefore, $P_{\mathrm{o:o}}(X = m)$ is given by the binomial law:

$$P_{\mathrm{o:o}}(X = m) = \frac{n!}{m!\,(n-m)!}\, f^m\, (1-f)^{n-m}. \qquad (36)$$
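The counting argument behind Eqs. (35) and (36) can be checked numerically on a toy case: summing the prior $P_{\mathrm{o:o}}(\bigcap_k A_{k,j_k})$ over every admissible one-to-one configuration should give 1. The sizes $n = 2$, $n' = 3$ and the value $f = 0.3$ below are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from itertools import product
from math import comb, factorial


def prior_one_to_one(config, n, n_prime, f):
    """P_o:o(∩_k A_{k,j_k}) from Eqs. (35)-(36); config = (j_1, ..., j_n), 0 = no counterpart."""
    m = sum(1 for j in config if j > 0)
    p_given_m = factorial(m) * factorial(n - m) * factorial(n_prime - m) / (
        factorial(n) * factorial(n_prime))            # Eq. (35)
    p_m = comb(n, m) * f**m * (1 - f) ** (n - m)     # Eq. (36), requires n <= n'
    return p_given_m * p_m


def is_one_to_one(config):
    """True if no K'-source is the counterpart of two K-sources."""
    matched = [j for j in config if j > 0]
    return len(matched) == len(set(matched))


n, n_prime, f = 2, 3, 0.3
total = sum(prior_one_to_one(c, n, n_prime, f)
            for c in product(range(n_prime + 1), repeat=n) if is_one_to_one(c))
print(round(total, 12))  # prints 1.0
```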



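From Eqs. (33), (34), (35) and (36), we obtain

$$P_{\mathrm{o:o}}(C \mid C') = \mathcal{A} \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0\\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} \frac{(n'-m)!}{n'!}\, f^m (1-f)^{n-m} \prod_{k=1}^{n} \xi_{k,j_k} = \mathcal{A}\, L_{\mathrm{o:o}}, \qquad (37)$$

where

$$L_{\mathrm{o:o}} = \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0\\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} \prod_{k=1}^{n} \eta_{k,j_k}, \qquad (38)$$

$\eta_{k,0} = \zeta_{k,0}$ and $\eta_{k,j_k} = f\, \xi_{k,j_k}/(n' - \#J_{k-1})$ if $j_k > 0$.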

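$P_{\mathrm{o:o}}(A_{i,j} \cap C \mid C')$ is computed in the same way as $P_{\mathrm{o:o}}(C \mid C')$:

$$P_{\mathrm{o:o}}(A_{i,j} \cap C \mid C') = P_{\mathrm{o:o}}\Big(C \cap A_{i,j} \cap \biguplus_{j_1=0}^{n'} \cdots \biguplus_{j_{i-1}=0}^{n'} \biguplus_{j_{i+1}=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{\substack{k=1\\ k \neq i}}^{n} A_{k,j_k} \Bigm| C'\Big)$$

$$= P_{\mathrm{o:o}}\Big(C \cap \biguplus_{\substack{j_1=0\\ j_1 \notin J^*_0}}^{n'} \cdots \biguplus_{\substack{j_{i-1}=0\\ j_{i-1} \notin J^*_{i-2}}}^{n'} \biguplus_{\substack{j_{i+1}=0\\ j_{i+1} \notin J^*_i}}^{n'} \cdots \biguplus_{\substack{j_n=0\\ j_n \notin J^*_{n-1}}}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big),$$

where $j_i = j$, $J^*_0 = \{j\} \setminus \{0\}$ and $J^*_k = (J^*_{k-1} \cup \{j_k\}) \setminus \{0\}$ for all $k \in \llbracket 1, n \rrbracket$, so

$$P_{\mathrm{o:o}}(A_{i,j} \cap C \mid C') = \sum_{\substack{j_1=0\\ j_1 \notin J^*_0}}^{n'} \cdots \sum_{\substack{j_{i-1}=0\\ j_{i-1} \notin J^*_{i-2}}}^{n'} \sum_{\substack{j_{i+1}=0\\ j_{i+1} \notin J^*_i}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J^*_{n-1}}}^{n'} P_{\mathrm{o:o}}\Big(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Big)\, P_{\mathrm{o:o}}\Big(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Big).$$

Let $m^* = \#J^*_n$. As for $P_{\mathrm{o:o}}(C \mid C')$,

$$P_{\mathrm{o:o}}(A_{i,j} \cap C \mid C') = \mathcal{A} \sum_{\substack{j_1=0\\ j_1 \notin J^*_0}}^{n'} \cdots \sum_{\substack{j_{i-1}=0\\ j_{i-1} \notin J^*_{i-2}}}^{n'} \sum_{\substack{j_{i+1}=0\\ j_{i+1} \notin J^*_i}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J^*_{n-1}}}^{n'} \frac{(n'-m^*)!}{n'!}\, f^{m^*} (1-f)^{n-m^*} \prod_{k=1}^{n} \xi_{k,j_k}$$

$$= \mathcal{A} \sum_{\substack{j_1=0\\ j_1 \notin J^*_0}}^{n'} \cdots \sum_{\substack{j_{i-1}=0\\ j_{i-1} \notin J^*_{i-2}}}^{n'} \sum_{\substack{j_{i+1}=0\\ j_{i+1} \notin J^*_i}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J^*_{n-1}}}^{n'} \prod_{k=1}^{n} \eta^*_{k,j_k}, \qquad (39)$$

where $\eta^*_{k,j_k} = f\, \xi_{k,j_k}/(n' - \#J^*_{k-1})$ if $k \neq i$ and $j_k > 0$, and $\eta^*_{k,j_k} = \zeta_{k,j_k}$ otherwise. Finally, from Eqs. (32), (37), (38) and (39),

$$P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C') = \frac{\displaystyle \sum_{\substack{j_1=0\\ j_1 \notin J^*_0}}^{n'} \cdots \sum_{\substack{j_{i-1}=0\\ j_{i-1} \notin J^*_{i-2}}}^{n'} \sum_{\substack{j_{i+1}=0\\ j_{i+1} \notin J^*_i}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J^*_{n-1}}}^{n'} \prod_{k=1}^{n} \eta^*_{k,j_k}}{\displaystyle \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0\\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} \prod_{k=1}^{n} \eta_{k,j_k}}. \qquad (40)$$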

The probability that a source $M'_j$ has no counterpart in $K$ is simply given by

$$P_{\mathrm{o:o}}(A_{0,j} \mid C \cap C') = 1 - \sum_{k=1}^{n} P_{\mathrm{o:o}}(A_{k,j} \mid C \cap C').$$
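For very small catalogs, Eq. (40) can be checked by brute force: enumerating every admissible one-to-one configuration and weighting it as in Eq. (37) gives all the $P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C')$ at once. The sketch below is a test harness only (its cost grows like $(n'+1)^n$); it assumes $\xi_{k,0} = 1/S$ and $n \leq n'$, and its name is hypothetical.

```python
from itertools import product
from math import factorial

import numpy as np


def p_one_to_one_exact(xi, f, area):
    """Exact P_{o:o}(A_{i,j} | C ∩ C') by enumerating all one-to-one configurations.

    xi: (n, n') array of xi_{k,j}, with n <= n'; xi_{k,0} = 1/area is assumed.
    Returns an (n, n'+1) array; column 0 is P_{o:o}(A_{k,0} | C ∩ C').
    """
    xi = np.asarray(xi, dtype=float)
    n, n_prime = xi.shape
    probs = np.zeros((n, n_prime + 1))
    total = 0.0                                      # proportional to L_{o:o}
    for config in product(range(n_prime + 1), repeat=n):
        matched = [j for j in config if j > 0]
        if len(matched) != len(set(matched)):
            continue                                 # some M'_j would be used twice
        m = len(matched)
        # (n' - m)!/n'! f^m (1 - f)^(n - m) prod_k xi_{k,j_k}, as in Eq. (37)
        w = factorial(n_prime - m) / factorial(n_prime) * f**m * (1.0 - f) ** (n - m)
        for k, j in enumerate(config):
            w *= xi[k, j - 1] if j > 0 else 1.0 / area
        total += w
        for k, j in enumerate(config):
            probs[k, j] += w
    return probs / total
```

For $n = 1$ the result coincides with Eq. (13), as expected; for $n > 1$, the column sums $\sum_k P_{\mathrm{o:o}}(A_{k,j} \mid C \cap C')$ never exceed 1, unlike in the several-to-one case.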

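4.2. Fraction of sources with a counterpart and other unknown parameters

4.2.1. Estimates

As in the several-to-one case, an estimate $\hat x_{\mathrm{o:o}}$ of the set $x$ of unknown parameters may be obtained by solving Eq. (15) (with the constraint $\hat f_{\mathrm{o:o}} \in [0, n/n']$). As the number of terms in $L_{\mathrm{o:o}}$ grows exponentially with $n$ and $n'$, Eq. (38) seems useless for this purpose. Fortunately, the computation of $L_{\mathrm{o:o}}$ is not necessary if the probabilities $P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C')$ are known (we will see in Sect. 5.2 how to approximate these).

Indeed, for any parameter $x_p$, let us show that we get the same result (Eq. (16)) as in the several-to-one case. Using Eq. (26), we obtain

$$\frac{\partial P_{\mathrm{o:o}}(C \mid C')}{\partial x_p} = \mathcal{A} \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0\\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} \sum_{i=1}^{n} \frac{\partial \ln \eta_{i,j_i}}{\partial x_p} \prod_{k=1}^{n} \eta_{k,j_k}. \qquad (41)$$

The expression of $P_{\mathrm{o:o}}(A_{i,j} \cap C \mid C')$ may also be written

$$P_{\mathrm{o:o}}(A_{i,j} \cap C \mid C') = \mathcal{A} \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0\\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} \mathbb{1}(j_i = j) \prod_{k=1}^{n} \eta_{k,j_k},$$

where $\mathbb{1}$ is the indicator function (i.e. $\mathbb{1}(j_i = j) = 1$ if proposition "$j_i = j$" is true and $\mathbb{1}(j_i = j) = 0$ otherwise), so

$$\sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, P_{\mathrm{o:o}}(A_{i,j} \cap C \mid C') = \mathcal{A} \sum_{i=1}^{n} \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, \mathbb{1}(j_i = j) \prod_{k=1}^{n} \eta_{k,j_k} = \mathcal{A} \sum_{i=1}^{n} \sum_{\substack{j_1=0\\ j_1 \notin J_0}}^{n'} \cdots \sum_{\substack{j_n=0\\ j_n \notin J_{n-1}}}^{n'} \frac{\partial \ln \zeta_{i,j_i}}{\partial x_p} \prod_{k=1}^{n} \eta_{k,j_k}. \qquad (42)$$

If $j_i = 0$, $\eta_{i,j_i} = \zeta_{i,j_i}$; and if $j_i > 0$, the numerators of $\eta_{i,j_i}$ and $\zeta_{i,j_i}$ are the same and their denominators do not depend on $x_p$: in all cases, $\partial \ln \eta_{i,j_i}/\partial x_p = \partial \ln \zeta_{i,j_i}/\partial x_p$. The right-hand sides of Eqs. (41) and (42) are therefore identical. Dividing their left-hand sides by $P_{\mathrm{o:o}}(C \mid C')$, one obtains again

$$\frac{\partial \ln L_{\mathrm{o:o}}}{\partial x_p} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C'). \qquad (43)$$

For $x_p = f$, one still has $\partial \ln \zeta_{i,0}/\partial f = -1/(1-f)$ and $\partial \ln \zeta_{i,j}/\partial f = 1/f$ if $j > 0$, so, as in the several-to-one case,

$$\frac{\partial \ln L_{\mathrm{o:o}}}{\partial f} = \frac{n\,(1-f) - \sum_{i=1}^{n} P_{\mathrm{o:o}}(A_{i,0} \mid C \cap C')}{f\,(1-f)}. \qquad (44)$$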

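4.2.2. Uncertainties

Regarding uncertainties on the $x_p$, Eqs. (23), (24) and (25) are valid in the one-to-one case too, so, from Eq. (43),

$$\frac{\partial^2 \ln L_{\mathrm{o:o}}}{\partial x_p\, \partial x_q} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \bigg[\frac{\partial^2 \ln \zeta_{i,j}}{\partial x_p\, \partial x_q}\, P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C') + \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, \frac{\partial P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C')}{\partial x_q}\bigg].$$

Contrary to the several-to-one case, no simple exact analytic expression of the terms $\partial P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C')/\partial x_q$ could be obtained. These derivatives may be computed numerically using finite differences; however, unless the fraction of sources having several likely counterparts is high, Eqs. (29) and (30) should provide a more convenient approximation of the covariance matrix of $\hat x_{\mathrm{o:o}}$.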

5. Practical implementation 

5.1. Several-to-one case 

5.1.1. Neighbors only!

In the several-to-one case, the computation of the probability of association $P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C')$ between $M_i$ and $M'_j$ from Eq. (12) is straightforward if $f$ and the positional uncertainties are known. However, the number of calculations for the whole sample or for the determination of $x$ is of the order of $n\,n'$.

As $\xi_{i,k}$ rapidly tends to 0 when the angular distance $r_{i,k}$ between $M_i$ and $M'_k$ increases, there is no need to sum from $k = 1$ to $n'$ in Eq. (12), nor to compute explicitly all the $P_{\mathrm{s:o}}(A_{i,k} \mid C \cap C')$. If $R$ is some angular distance above which $\xi_{i,k} \ll n'/S$, one may set $\xi_{i,k}$ to 0 (and $P_{\mathrm{s:o}}(A_{i,k} \mid C \cap C')$ too) if $r_{i,k} > R$ and replace the sums $\sum_{k=1}^{n'}$ by $\sum_{k:\, r_{i,k} \leq R}$.

In fact, for most $M_i$, one does not even need to test whether $r_{i,k} \leq R$ for each $M'_k \in K'$. Let us write $E_i$ the domain of right ascensions $\alpha'$ out of which no point $M'$ of declination $\delta'$ closer than $R$ to $M_i$ may be found. The angular distance $\psi$ between $M'$ and $M_i$ is given (cf. Eq. (A.1)) by

$$\cos\psi = \cos(\alpha' - \alpha_i) \cos\delta_i \cos\delta' + \sin\delta_i \sin\delta'.$$

If $\delta_i \in [-\pi/2 + R,\ \pi/2 - R]$, the minimum of $\cos(\alpha' - \alpha_i)$ under the constraint $\cos\psi \geq \cos R$ is reached when $\sin\delta' = \sin\delta_i/\cos R$ and

$$\cos(\alpha' - \alpha_i) = \frac{\sqrt{\cos^2 R - \sin^2 \delta_i}}{\cos\delta_i}.$$

Let $\Delta_i = \arccos\big(\sqrt{\cos^2 R - \sin^2 \delta_i}/\cos\delta_i\big)$. The domain $E_i$ is given by

$$E_i = \begin{cases} [0,\ \alpha_i + \Delta_i - 2\pi] \cup [\alpha_i - \Delta_i,\ 2\pi] & \text{if } \alpha_i + \Delta_i > 2\pi, \\ [0,\ \alpha_i + \Delta_i] \cup [\alpha_i - \Delta_i + 2\pi,\ 2\pi] & \text{if } \alpha_i - \Delta_i < 0, \\ [\alpha_i - \Delta_i,\ \alpha_i + \Delta_i] & \text{otherwise}. \end{cases}$$

If $\delta_i \in [-\pi/2,\ -\pi/2 + R] \cup [\pi/2 - R,\ \pi/2]$, one has $E_i = [0, 2\pi]$.

For a catalog $K'$ ordered by increasing right ascension (if not, this is the first thing to do), one may easily find the subset of indices $k$ for which $\alpha'_k \in E_i$. For instance, if $E_i = [\alpha_i - \Delta_i,\ \alpha_i + \Delta_i]$, one just has to find by dichotomy the indices $k_-$ and $k_+$ such that $\alpha'_{k_- - 1} < \alpha_i - \Delta_i < \alpha'_{k_-}$ and $\alpha'_{k_+} < \alpha_i + \Delta_i < \alpha'_{k_+ + 1}$. The sums $\sum_{k:\, r_{i,k} \leq R}$ may then be replaced by sums over the indices $k \in \llbracket k_-, k_+ \rrbracket$ such that $r_{i,k} \leq R$.

In all cases, the sum may be further restricted to sources with a declination $\delta'_k \in [\delta_i - R,\ \delta_i + R] \cap [-\pi/2,\ \pi/2]$.
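The construction of $E_i$ and the dichotomic search can be sketched as follows. This is a minimal illustration, assuming angles in radians and a list of $K'$ right ascensions already sorted in increasing order; the function names are hypothetical, and the declination cut and the final test $r_{i,k} \leq R$ are left to the caller.

```python
import bisect
import math


def ra_domain(alpha_i, delta_i, radius):
    """Right-ascension domain E_i (list of intervals, radians) around M_i."""
    if abs(delta_i) >= math.pi / 2 - radius:          # circle of radius R reaches a pole
        return [(0.0, 2.0 * math.pi)]
    ratio = math.sqrt(math.cos(radius) ** 2 - math.sin(delta_i) ** 2) / math.cos(delta_i)
    half_width = math.acos(min(1.0, ratio))            # Delta_i; min() guards rounding
    lo, hi = alpha_i - half_width, alpha_i + half_width
    if hi > 2.0 * math.pi:
        return [(0.0, hi - 2.0 * math.pi), (lo, 2.0 * math.pi)]
    if lo < 0.0:
        return [(0.0, hi), (lo + 2.0 * math.pi, 2.0 * math.pi)]
    return [(lo, hi)]


def candidate_indices(alpha_sorted, alpha_i, delta_i, radius):
    """Indices k with alpha'_k in E_i, for right ascensions sorted increasingly."""
    indices = []
    for lo, hi in ra_domain(alpha_i, delta_i, radius):
        k_minus = bisect.bisect_left(alpha_sorted, lo)   # first alpha'_k >= lo
        k_plus = bisect.bisect_right(alpha_sorted, hi)   # one past the last alpha'_k <= hi
        indices.extend(range(k_minus, k_plus))
    return indices
```

Using `bisect` keeps each search logarithmic in $n'$, which is the point of sorting $K'$ by right ascension.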

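5.1.2. Fraction of sources with a counterpart

All the probabilities depend on $f$ and, possibly, on other unknown parameters like $\sigma$ and $\nu$. These parameters may be found by solving Eq. (15) using Eq. (16).

If the fraction of sources with a counterpart is the only unknown, the $\xi_{i,j}$ need to be computed only once and $f$ may be easily determined from Eq. (19). Denote $g$ the function

$$g: f \mapsto 1 - \frac{1}{n} \sum_{i=1}^{n} P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C').$$

Let us show that, for any $f_0 \in\, ]0, 1[$, the sequence $(f_k)_{k \in \mathbb{N}}$ defined by $f_{k+1} = g(f_k)$ tends to $\hat f_{\mathrm{s:o}}$.

First, note that, from Eq. (18),

$$g(f) - f = \frac{f\,(1-f)}{n}\, \frac{\partial \ln L_{\mathrm{s:o}}}{\partial f}.$$

The only fixed points of $g$ are hence 0, 1 and $\hat f_{\mathrm{s:o}}$. As $\partial^2 \ln L_{\mathrm{s:o}}/\partial f^2 < 0$ (Eq. (21)), one has $\partial \ln L_{\mathrm{s:o}}/\partial f > 0$ and thus $g(f) > f$ for $f \in\, ]0, \hat f_{\mathrm{s:o}}[$; similarly, $\partial \ln L_{\mathrm{s:o}}/\partial f < 0$ and $g(f) < f$ for $f \in\, ]\hat f_{\mathrm{s:o}}, 1[$. Because

$$\frac{\mathrm{d}g}{\mathrm{d}f} = \frac{1}{n\, f\,(1-f)} \sum_{i=1}^{n} P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')\, \big[1 - P_{\mathrm{s:o}}(A_{i,0} \mid C \cap C')\big] > 0,$$

$g$ is also an increasing function.

Let us consider the case $f_0 \in\, ]0, \hat f_{\mathrm{s:o}}]$. If $f_k < \hat f_{\mathrm{s:o}}$, $g(f_k) > f_k$ and $g(f_k) < g(\hat f_{\mathrm{s:o}}) = \hat f_{\mathrm{s:o}}$. As $g(f_k) = f_{k+1}$, $(f_k)_{k \in \mathbb{N}}$ is an increasing sequence bounded from above by $\hat f_{\mathrm{s:o}}$: it therefore converges in $[f_0, \hat f_{\mathrm{s:o}}]$. Because $g$ is continuous and $\hat f_{\mathrm{s:o}}$ is the only fixed point in this interval, $(f_k)_{k \in \mathbb{N}}$ tends to $\hat f_{\mathrm{s:o}}$.

Similarly, if $f_0 \in [\hat f_{\mathrm{s:o}}, 1[$, $(f_k)_{k \in \mathbb{N}}$ is a decreasing sequence converging to $\hat f_{\mathrm{s:o}}$.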
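The iteration $f_{k+1} = g(f_k)$ can be sketched as below, assuming the $\xi_{i,k}$ have been computed once (possibly with the neighbor truncation of Sect. 5.1.1) and stored as an $n \times n'$ array. The stopping tolerance and iteration cap are arbitrary choices, and the function name is hypothetical.

```python
import numpy as np


def estimate_f_s_o(xi, n_prime, area, f0=0.5, tol=1e-12, max_iter=100_000):
    """Iterate f_{k+1} = g(f_k) (Sect. 5.1.2) towards f_hat_{s:o}.

    xi: (n, n') array of xi_{i,k}, computed once since f is the only unknown.
    f0: starting value, required to lie in ]0, 1[.
    """
    xi_sum = np.asarray(xi, dtype=float).sum(axis=1)   # sum_k xi_{i,k} for each M_i
    f = f0
    for _ in range(max_iter):
        no_match = (1.0 - f) * n_prime / area
        p0 = no_match / (no_match + f * xi_sum)         # Eq. (13) with j = 0
        f_next = 1.0 - p0.mean()                        # g(f), Eq. (19)
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    return f
```

Since only the row sums $\sum_k \xi_{i,k}$ enter $g$, each iteration costs $O(n)$ once these sums are stored.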

5.2. One-to-one case 

All that was said for the several-to-one case still holds in the one-to-one case. Incidentally, as the former is computationally much simpler than the latter, it is a good idea to compute first $\hat x_{\mathrm{s:o}}$ and the probabilities $P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C')$: as $\hat f_{\mathrm{s:o}}/n' \geq \hat f'_{\mathrm{s:o}}/n$ and $\hat f_{\mathrm{o:o}}/n' = \hat f'_{\mathrm{o:o}}/n$, the several-to-one assumption is probably correct if $\hat f_{\mathrm{s:o}}/n' \gg \hat f'_{\mathrm{s:o}}/n$; if not, one may first test the one-to-several (subscript "o:s" hereafter) assumption, i.e. reverse the roles of $K$ and $K'$ in all the formulae of Sect. 3, and adopt it if $\hat f_{\mathrm{o:s}}/n' \ll \hat f'_{\mathrm{o:s}}/n$. Ideally, one would compare the likelihood of each assumption and adopt the most likely one. While $L_{\mathrm{s:o}}$ and $L_{\mathrm{o:s}}$ are easily computed, no convenient expression was found for $L_{\mathrm{o:o}}$. However, if $\ln L_{\mathrm{s:o}}$ and $\ln L_{\mathrm{o:s}}$ are of the same order, this provides some hint that the one-to-one case (or maybe the several-to-several one!) should be considered. Even then, $\hat x_{\mathrm{s:o}}$ will still be a good starting point to find $\hat x_{\mathrm{o:o}}$, and there will be no need to compute $P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C')$ for all couples $(i, j)$ such that $P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C') \approx P_{\mathrm{o:s}}(A_{i,j} \mid C \cap C') \approx 1$.

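The results of Sect. 4.2 are given in terms of $P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C')$. The only difficulty is to estimate this probability from Eq. (40). Because of the combinatorial explosion of the number of terms, an exact computation is hopeless. An approximate value might however be obtained in the following way.

For any $M_i$, let $\phi$ be a permutation on $\llbracket 1, n \rrbracket$ ordering the elements $M_{\phi(1)}, M_{\phi(2)}, \ldots, M_{\phi(n)}$ of $K$ by increasing angular distance to $M_i$. For $j = 0$ or $M'_j$ in the neighborhood of $M_i$, and for any $\ell \in \llbracket 1, n \rrbracket$, define

$$P_\ell(A_{i,j} \mid C \cap C') = \frac{\displaystyle \sum_{\substack{j_2=0\\ j_2 \notin J^{\ell*}_1}}^{n'} \cdots \sum_{\substack{j_\ell=0\\ j_\ell \notin J^{\ell*}_{\ell-1}}}^{n'} \prod_{k=1}^{\ell} \eta^{\ell*}_{k,j_k}}{\displaystyle \sum_{\substack{j_1=0\\ j_1 \notin J^{\ell}_0}}^{n'} \cdots \sum_{\substack{j_\ell=0\\ j_\ell \notin J^{\ell}_{\ell-1}}}^{n'} \prod_{k=1}^{\ell} \eta^{\ell}_{k,j_k}}, \qquad (45)$$

where, in the numerator, $j_1 = j$, $J^{\ell*}_0 = \emptyset$ and $J^{\ell*}_k = (J^{\ell*}_{k-1} \cup \{j_k\}) \setminus \{0\}$ for all $k \in \llbracket 1, \ell \rrbracket$ (so that $J^{\ell*}_1 = \{j\} \setminus \{0\}$); $J^{\ell}_k = J_k$ for all $k$;

$$\eta^{\ell}_{k,j_k} = \frac{f\, \xi_{\phi(k),j_k}}{n' - \#J^{\ell}_{k-1}} \quad \text{and} \quad \eta^{\ell*}_{k,j_k} = \frac{f\, \xi_{\phi(k),j_k}}{n' - \#J^{\ell*}_{k-1}} \quad \text{if } j_k > 0,$$

and $\eta^{\ell}_{k,0} = \eta^{\ell*}_{k,0} = \zeta_{\phi(k),0}$.

As $\phi(1) = i$, $P_1(A_{i,j} \mid C \cap C') = P_{\mathrm{s:o}}(A_{i,j} \mid C \cap C')$ (cf. Eq. (12)): at first order, we obtain the same result as in the several-to-one case. Since the influence of other $K$-sources on the result decreases very fast with their angular distance to $M_i$ and $M'_j$ if $M_i$ and $M'_j$ are close to each other, $P_\ell(A_{i,j} \mid C \cap C')$ should rapidly converge to $P_n(A_{i,j} \mid C \cap C') = P_{\mathrm{o:o}}(A_{i,j} \mid C \cap C')$, even for small values of $\ell$.

Because of the recursive sums in Eq. (45), the computation must in practice be further restricted to sources $M'_{j_k}$ in the neighborhood of $M_i$ and $M'_j$, as explained in Sect. 5.1.1.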

Appendix A: Covariance matrix 

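Let us first recall a few standard results. The probability that a $q$-dimensional normally distributed random vector $W$ of mean $\mu$ falls in some domain $\Omega$ is

$$P(W \in \Omega) = \int_{\Omega} \frac{\exp\big(-\tfrac{1}{2}\, (w-\mu)_B^{\mathsf T} \cdot \Gamma_B^{-1} \cdot (w-\mu)_B\big)}{(2\pi)^{q/2}\, (\det \Gamma_B)^{1/2}}\, \mathrm{d}^q w_B,$$

where $B = (u_1, \ldots, u_q)$ is a basis, $w_B = (w_1, \ldots, w_q)^{\mathsf T}$ is the column vector in $B$ of $w = \sum_{i=1}^{q} w_i\, u_i$, $\mathrm{d}^q w_B = \mathrm{d}w_1 \times \cdots \times \mathrm{d}w_q$ and $\Gamma_B$ is the covariance matrix of $W$ in $B$. We note this $W_B \sim \mathcal{G}_q(\mu_B, \Gamma_B)$.

In another basis $B' = (u'_1, \ldots, u'_q)$, one has $w_B = T_{B \to B'} \cdot w_{B'}$, where $T_{B \to B'}$ is the transformation matrix from $B$ to $B'$ (i.e. $u'_j = \sum_{i=1}^{q} (T_{B \to B'})_{i,j}\, u_i$). Since $\mathrm{d}^q w_B = |\det T_{B \to B'}|\, \mathrm{d}^q w_{B'}$ and

$$(w-\mu)_B^{\mathsf T} \cdot \Gamma_B^{-1} \cdot (w-\mu)_B = (w-\mu)_{B'}^{\mathsf T} \cdot \big[T_{B \to B'}^{-1} \cdot \Gamma_B \cdot (T_{B \to B'}^{-1})^{\mathsf T}\big]^{-1} \cdot (w-\mu)_{B'},$$

one still obtains

$$P(W \in \Omega) = \int_{\Omega} \frac{\exp\big(-\tfrac{1}{2}\, (w-\mu)_{B'}^{\mathsf T} \cdot \Gamma_{B'}^{-1} \cdot (w-\mu)_{B'}\big)}{(2\pi)^{q/2}\, (\det \Gamma_{B'})^{1/2}}\, \mathrm{d}^q w_{B'},$$

where $\Gamma_{B'} = T_{B \to B'}^{-1} \cdot \Gamma_B \cdot (T_{B \to B'}^{-1})^{\mathsf T}$ is the covariance matrix of $W$ in $B'$. In the following, $B$ and $B'$ are orthonormal bases, so $T_{B \to B'}$ is a rotation matrix. From $T_{B \to B'}^{-1} = T_{B \to B'}^{\mathsf T}$, one gets $\Gamma_{B'} = T_{B \to B'}^{\mathsf T} \cdot \Gamma_B \cdot T_{B \to B'}$.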

In a common basis, for independent random vectors $W_1 \sim \mathcal{G}_q(\mu_1, \Gamma_1)$ and $W_2 \sim \mathcal{G}_q(\mu_2, \Gamma_2)$, we have

$$W_1 \pm W_2 \sim \mathcal{G}_q(\mu_1 \pm \mu_2,\ \Gamma_1 + \Gamma_2).$$

We now use these results to obtain the covariance matrix of the vector $r_{i,j} = r'_j - r_i$, where $r_i$ and $r'_j$ are, respectively, the observed positions of source $M_i$ of $K$ and of its counterpart $M'_j$ in $K'$. We note $r_i^0$ and $r'^0_j$ their true positions. One has

$$r_{i,j} = (r'_j - r'^0_j) + (r'^0_j - r_i^0) + (r_i^0 - r_i).$$

We drop the subscript and the "prime" symbol in the following whenever an expression depends on either $M_i$ or $M'_j$ only.

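Let $(u_x, u_y, u_z)$ be a direct orthonormal basis, with $u_z$ oriented from the Earth's center $O$ to the North Celestial Pole and $u_x$ from $O$ to the Vernal Point. At a point $M$ of right ascension $\alpha$ and declination $\delta$, a direct orthonormal basis $(u_r, u_\alpha, u_\delta)$ is defined by

$$u_r = \frac{\overrightarrow{OM}}{\|\overrightarrow{OM}\|} = \cos\delta \cos\alpha\, u_x + \cos\delta \sin\alpha\, u_y + \sin\delta\, u_z,$$

$$u_\alpha = \frac{\partial u_r/\partial\alpha}{\|\partial u_r/\partial\alpha\|} = -\sin\alpha\, u_x + \cos\alpha\, u_y,$$

$$u_\delta = \frac{\partial u_r/\partial\delta}{\|\partial u_r/\partial\delta\|} = -\sin\delta \cos\alpha\, u_x - \sin\delta \sin\alpha\, u_y + \cos\delta\, u_z.$$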



The uncertainty ellipse on the position of $M$ is characterized by the lengths $a$ and $b$ of the semi-major and semi-minor axes, and by the position angle $\beta$ between the North and the semi-major axis. Let $u_a$ and $u_b$ be unit vectors directed respectively along the major and the minor axes, and such that $(u_r, u_a, u_b)$ is a direct orthonormal basis and $\beta = (u_\delta, u_a)$ is in $[0, \pi]$ when counted eastward. In the plane oriented by $+u_r$,

$$T_{(u_a,u_b)\to(u_\alpha,u_\delta)} = \begin{pmatrix} \sin\beta & \cos\beta \\ -\cos\beta & \sin\beta \end{pmatrix} = \mathrm{Rot}(\beta),$$

since $(u_\alpha, u_\delta)$ is obtained from $(u_a, u_b)$ by a $(\beta - \pi/2)$-counterclockwise rotation. As

$$\Gamma_{(u_a,u_b)} = \begin{pmatrix} a^2 & 0 \\ 0 & b^2 \end{pmatrix} = \mathrm{Diag}(a^2, b^2),$$

one has $\Gamma_{(u_\alpha,u_\delta)} = \mathrm{Rot}^t(\beta) \cdot \mathrm{Diag}(a^2, b^2) \cdot \mathrm{Rot}(\beta)$.
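As a sanity check of the sign conventions, the covariance of an uncertainty ellipse in $(u_\alpha, u_\delta)$ can be computed as follows. This is a sketch with hypothetical helper names; note that `rot` implements the matrix $\mathrm{Rot}(\beta)$ as defined just above, which is not the usual counterclockwise rotation matrix. With $\beta = 0$ the major axis points North, so the variance along $u_\delta$ should be $a^2$; with $\beta = 45°$ the off-diagonal term should be $(a^2 - b^2)\sin\beta\cos\beta > 0$ (major axis toward the North-East).

```python
import math

def rot(theta):
    """Rot(theta) as defined in the text: [[sin, cos], [-cos, sin]]."""
    s, c = math.sin(theta), math.cos(theta)
    return [[s, c], [-c, s]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def ellipse_cov(a, b, beta):
    """Covariance in (u_alpha, u_delta): Rot^t(beta) . Diag(a^2, b^2) . Rot(beta)."""
    R = rot(beta)
    return matmul(matmul(transpose(R), [[a * a, 0.0], [0.0, b * b]]), R)

# Hypothetical ellipse: a = 2, b = 1 (arbitrary units), position angle 45 deg.
G = ellipse_cov(2.0, 1.0, math.radians(45.0))
```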

As noticed by Pineau et al. (2011), for sources close to the Poles, $(u_{\alpha_i}, u_{\delta_i}) \not\approx (u_{\alpha'_j}, u_{\delta'_j})$, so one needs to define a common basis. We use the same basis as theirs, noted $(t, n)$ below. While the results we get are intrinsically the same, some readers may find our expressions more convenient.

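Denote $\psi = (u_{r_i}, u_{r'_j}) \in [0, \pi]$ the angular distance between $M_i$ and $M'_j$, and $n = u_{r_i} \times u_{r'_j} / \|u_{r_i} \times u_{r'_j}\|$ a unit vector perpendicular to the plane $(O, M_i, M'_j)$. One has $u_{r_i} \cdot u_{r'_j} = \cos\psi$, so

$$\psi = \arccos\left(\cos\delta_i \cos\delta'_j \cos[\alpha'_j - \alpha_i] + \sin\delta_i \sin\delta'_j\right), \tag{A.1}$$

and $\|u_{r_i} \times u_{r'_j}\| = \sin\psi$.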
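Equation (A.1) can be coded directly, as in the sketch below (hypothetical function names, angles in radians). Because $\arccos$ is ill-conditioned near 1, (A.1) loses relative precision as $\psi \to 0$, which is precisely the regime of interest for cross-identification; the sketch therefore also shows the mathematically equivalent haversine form, a standard alternative that is not part of the original text.

```python
import math

def psi_arccos(alpha_i, delta_i, alpha_j, delta_j):
    """Angular distance from Eq. (A.1); the cosine is clamped against round-off."""
    c = (math.cos(delta_i) * math.cos(delta_j) * math.cos(alpha_j - alpha_i)
         + math.sin(delta_i) * math.sin(delta_j))
    return math.acos(max(-1.0, min(1.0, c)))

def psi_haversine(alpha_i, delta_i, alpha_j, delta_j):
    """Same angle, haversine form; better conditioned for very small separations."""
    h = (math.sin((delta_j - delta_i) / 2) ** 2
         + math.cos(delta_i) * math.cos(delta_j)
         * math.sin((alpha_j - alpha_i) / 2) ** 2)
    return 2.0 * math.asin(min(1.0, math.sqrt(h)))
```

For separations of a few degrees or more the two functions agree to machine precision; for sub-arcsecond separations the haversine form is the safer choice.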

Let $\gamma_i = (n, u_{\delta_i})$ and $\gamma'_j = (n, u_{\delta'_j})$ be angles oriented clockwise around $+u_{r_i}$ and $+u_{r'_j}$, respectively. Angle $\gamma_i$ is fully determined by the following expressions:

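$$\cos\gamma_i = n \cdot u_{\delta_i} = \frac{(u_{r_i} \times u_{r'_j}) \cdot u_{\delta_i}}{\sin\psi} = \frac{(u_{\delta_i} \times u_{r_i}) \cdot u_{r'_j}}{\sin\psi} = \frac{u_{\alpha_i} \cdot u_{r'_j}}{\sin\psi} = \frac{\cos\delta'_j \sin(\alpha'_j - \alpha_i)}{\sin\psi},$$

$$\sin\gamma_i = -n \cdot u_{\alpha_i} = -\frac{(u_{r_i} \times u_{r'_j}) \cdot u_{\alpha_i}}{\sin\psi} = \frac{(u_{r_i} \times u_{\alpha_i}) \cdot u_{r'_j}}{\sin\psi} = \frac{u_{\delta_i} \cdot u_{r'_j}}{\sin\psi} = \frac{\cos\delta_i \sin\delta'_j - \sin\delta_i \cos\delta'_j \cos(\alpha'_j - \alpha_i)}{\sin\psi}.$$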



We seize this opportunity to correct Eqs. (A.8) to (A.11) of Pineau et al. (2011): $a$ and $b$ should be replaced by their squares in these formulae.
Similarly,

$$\cos\gamma'_j = \frac{\cos\delta_i \sin(\alpha'_j - \alpha_i)}{\sin\psi} \quad\text{and}\quad \sin\gamma'_j = \frac{\cos\delta_i \sin\delta'_j \cos(\alpha'_j - \alpha_i) - \sin\delta_i \cos\delta'_j}{\sin\psi}.$$
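The closed forms for $\gamma_i$ and $\gamma'_j$ can be cross-checked against their vector definitions $\cos\gamma = n \cdot u_\delta$ and $\sin\gamma = -n \cdot u_\alpha$. In the sketch below (hypothetical names, radians), $\sin\psi$ is obtained from the two numerators of $\cos\gamma_i$ and $\sin\gamma_i$ via $\cos^2\gamma_i + \sin^2\gamma_i = 1$, which avoids computing $\psi$ itself; this shortcut is our own choice, and it assumes $M_i$ and $M'_j$ are distinct and not antipodal.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def local_basis(alpha, delta):
    """(u_r, u_alpha, u_delta) at (alpha, delta), in the frame (u_x, u_y, u_z)."""
    ca, sa, cd, sd = math.cos(alpha), math.sin(alpha), math.cos(delta), math.sin(delta)
    return (cd * ca, cd * sa, sd), (-sa, ca, 0.0), (-sd * ca, -sd * sa, cd)

def gamma_trig(alpha_i, delta_i, alpha_j, delta_j):
    """(cos g_i, sin g_i, cos g'_j, sin g'_j) from the closed-form expressions."""
    cdi, sdi = math.cos(delta_i), math.sin(delta_i)
    cdj, sdj = math.cos(delta_j), math.sin(delta_j)
    sda, cda = math.sin(alpha_j - alpha_i), math.cos(alpha_j - alpha_i)
    num_cos_i = cdj * sda
    num_sin_i = cdi * sdj - sdi * cdj * cda
    sin_psi = math.hypot(num_cos_i, num_sin_i)   # = ||u_ri x u_rj||
    return (num_cos_i / sin_psi, num_sin_i / sin_psi,
            cdi * sda / sin_psi, (cdi * sdj * cda - sdi * cdj) / sin_psi)

# Hypothetical pair of nearby positions (radians).
ai, di, aj, dj = 0.30, 0.40, 0.3002, 0.4001
cg_i, sg_i, cg_j, sg_j = gamma_trig(ai, di, aj, dj)

# Vector definitions for comparison: n = u_ri x u_rj / sin(psi).
ur_i, ua_i, ud_i = local_basis(ai, di)
ur_j, ua_j, ud_j = local_basis(aj, dj)
c = cross(ur_i, ur_j)
n = tuple(x / math.sqrt(dot(c, c)) for x in c)
```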

Note that determining $\gamma_i$ and $\gamma'_j$ themselves might slow down the computations: for instance, only the sines and cosines of $\beta_i$ and $\gamma_i$ are of interest in the matrices $\mathrm{Rot}(\beta_i + \gamma_i)$ used hereafter, as is obvious from the expansion of $\sin(\beta_i + \gamma_i)$ and $\cos(\beta_i + \gamma_i)$. The same holds for $\mathrm{Rot}(\beta'_j + \gamma'_j)$ and other matrices.
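The remark above amounts to building $\mathrm{Rot}(\beta + \gamma)$ from the addition formulas alone. A minimal sketch, with a hypothetical function name and arbitrary test angles:

```python
import math

def rot_sum(sin_b, cos_b, sin_g, cos_g):
    """Rot(beta + gamma) = [[sin, cos], [-cos, sin]] of the sum angle, built only
    from the sines and cosines of beta and gamma (no atan2 or acos call)."""
    s = sin_b * cos_g + cos_b * sin_g   # sin(beta + gamma)
    c = cos_b * cos_g - sin_b * sin_g   # cos(beta + gamma)
    return [[s, c], [-c, s]]

beta, gamma = math.radians(30.0), math.radians(-75.0)   # hypothetical angles
R = rot_sum(math.sin(beta), math.cos(beta), math.sin(gamma), math.cos(gamma))
```

In practice, $\sin\gamma_i$ and $\cos\gamma_i$ would come from the closed-form expressions above, while $\sin\beta_i$ and $\cos\beta_i$ need be computed only once per source.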

Let $t = n \times u_{r_i}$: $t$ is a unit vector tangent in $M_i$ to the minor arc of great circle going from $M_i$ to $M'_j$. Project the sphere on the plane $(M_i, t, n)$ tangent to the sphere in $M_i$ (the specific projection does not matter since we consider only $K'$-sources in the neighborhood of $M_i$): one has $r_{ij} \approx \psi\, t$, and the basis $(t, n)$ is obtained from $(u_a, u_b)$ by a $(\beta + \gamma - \pi/2)$-counterclockwise rotation around $+u_{r_i}$, so, in $(t, n)$,

$$\Gamma_i = \mathrm{Rot}^t(\beta_i + \gamma_i) \cdot \mathrm{Diag}(a_i^2, b_i^2) \cdot \mathrm{Rot}(\beta_i + \gamma_i) \quad\text{and}\quad \Gamma'_j = \mathrm{Rot}^t(\beta'_j + \gamma'_j) \cdot \mathrm{Diag}(a'^2_j, b'^2_j) \cdot \mathrm{Rot}(\beta'_j + \gamma'_j).$$

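As $r_i - r^\circ_i \sim \mathcal{G}_2(0, \Gamma_i)$ and $r'_j - r'^\circ_j \sim \mathcal{G}_2(0, \Gamma'_j)$, one has $r_{ij} \sim \mathcal{G}_2(0, \Gamma_{ij})$ if the true positions are identical, where $\Gamma_{ij} = \Gamma_i + \Gamma'_j$.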
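Putting these pieces together, the Gaussian density of $r_{ij}$ evaluated at the observed offset $\psi\, t$, i.e. at the point $(\psi, 0)$ in $(t, n)$, can be sketched as below. This is the kind of quantity a positional likelihood would evaluate, but the function names and numbers are hypothetical, and the sketch makes no claim about how the density is combined with other terms in the association probabilities. `cov_tn` is the expanded form of $\mathrm{Rot}^t(\beta + \gamma) \cdot \mathrm{Diag}(a^2, b^2) \cdot \mathrm{Rot}(\beta + \gamma)$.

```python
import math

def cov_tn(a, b, beta, gamma):
    """Expanded Rot^t(beta+gamma) . Diag(a^2, b^2) . Rot(beta+gamma) in (t, n)."""
    s, c = math.sin(beta + gamma), math.cos(beta + gamma)
    a2, b2 = a * a, b * b
    off = (a2 - b2) * s * c
    return [[a2 * s * s + b2 * c * c, off], [off, a2 * c * c + b2 * s * s]]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def density_along_t(psi, G):
    """Density of G_2(0, G) at the point (psi, 0) of the (t, n) plane."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    inv_tt = G[1][1] / det            # (G^{-1})_{tt}
    return math.exp(-0.5 * psi * psi * inv_tt) / (2.0 * math.pi * math.sqrt(det))

# Hypothetical ellipses (arcsec) and angles (radians).
G_i = cov_tn(0.5, 0.3, 0.4, 1.1)
G_j = cov_tn(1.2, 1.2, 0.0, -0.7)     # circular error: the angles play no role
G_ij = add(G_i, G_j)
f = density_along_t(0.8, G_ij)
```

For two circular errors of unit radius, $\Gamma_{ij} = 2\,\mathrm{Diag}(1, 1)$ and the density at $\psi = 0.5$ reduces to $\exp(-0.0625)/(4\pi)$, which makes a convenient check.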

If the positional uncertainty on $M_i$ is unknown, one may assume that $\Gamma_i = \sigma^2\, \mathrm{Diag}(1, 1)$, with the same $\sigma$ for all $K$-sources, and derive $\hat\sigma = \sigma$ by maximizing the likelihood to observe the distribution of $K$-sources given that of $K'$-sources (see Sects. 3.2 and 4.2). For a galaxy, however, the positional uncertainty on its center is likely to increase with its size. If the position angle $\theta_i$ (counted eastward from the North) and the major and minor diameters $D_i$ and $d_i$ of the best-fitting ellipse of some isophote are known for $M_i$ (for instance, parameters pa, $D_{25}$ and $d_{25} = D_{25}/R_{25}$ taken from the RC3 catalog (de Vaucouleurs et al. 1991) or HyperLeda (Paturel et al. 2003)), one may model $\Gamma_i$ as

$$\Gamma_i = \mathrm{Rot}^t(\gamma_i + \theta_i) \cdot \mathrm{Diag}(\sigma^2 + [\nu D_i]^2, \sigma^2 + [\nu d_i]^2) \cdot \mathrm{Rot}(\gamma_i + \theta_i) = \sigma^2\, \mathrm{Diag}(1, 1) + \nu^2\, \mathrm{Rot}^t(\gamma_i + \theta_i) \cdot \mathrm{Diag}(D_i^2, d_i^2) \cdot \mathrm{Rot}(\gamma_i + \theta_i),$$

and derive both $\hat\sigma = \sigma$ and $\hat\nu = \nu$ from the maximum likelihood. Such a technique might indeed be used to estimate the accuracy of coordinates in some catalog (see Paturel & Petit (1999) for another method).
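The principle of the estimation can be illustrated in the simplest possible setting. The sketch below is a deliberately simplified toy, not the procedure of Sects. 3.2 and 4.2: it assumes that every pair is a true counterpart (no chance associations, no unassociated sources) and that $\Gamma_{ij} = \hat\sigma^2\, \mathrm{Diag}(1, 1)$. Maximizing $\sum_k [-\ln(2\pi s^2) - \psi_k^2/(2 s^2)]$ then gives the closed form $\hat\sigma^2 = \sum_k \psi_k^2 / (2N)$. The data are synthetic and the function names hypothetical; estimating $\hat\nu$ as well would require a numerical optimizer.

```python
import math
import random

def sigma_hat_isotropic(offsets):
    """Closed-form ML estimate of sigma_hat for isotropic errors and true pairs only."""
    return math.sqrt(sum(psi * psi for psi in offsets) / (2 * len(offsets)))

def log_likelihood(s, offsets):
    """Sum of log G_2(0, s^2 Diag(1, 1)) densities at offsets of length psi."""
    return sum(-math.log(2 * math.pi * s * s) - psi * psi / (2 * s * s)
               for psi in offsets)

# Synthetic offsets (hypothetical data): true sigma_hat = 2 arcsec.
rng = random.Random(7)
offsets = [math.hypot(rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)) for _ in range(20_000)]
s_hat = sigma_hat_isotropic(offsets)
```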
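If the positional uncertainty on $M'_j$ is also unknown, one can put

$$\Gamma'_j = \sigma'^2\, \mathrm{Diag}(1, 1) + \nu'^2\, \mathrm{Rot}^t(\gamma'_j + \theta'_j) \cdot \mathrm{Diag}(D_i^2, d_i^2) \cdot \mathrm{Rot}(\gamma'_j + \theta'_j),$$

with the same $\sigma'$ and $\nu'$ for all $K'$-sources. As $\gamma'_j + \theta'_j \approx \gamma_i + \theta_i$, only $\hat\sigma = (\sigma^2 + \sigma'^2)^{1/2}$ and $\hat\nu = (\nu^2 + \nu'^2)^{1/2}$ may be obtained from the maximum likelihood, not $\sigma$, $\sigma'$, $\nu$ or $\nu'$.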

A similar technique can be applied if the true centers of a source in $K$ and of its counterpart in $K'$ may differ. This might be particularly useful when associating galaxies from an optical catalog and from an ultraviolet or far-infrared catalog, because, while the optical is dominated by smoothly distributed evolved stellar populations, the ultraviolet and the far-infrared mainly trace star-forming regions. Observations of galaxies by Kuchinski et al. (2000) have indeed shown that galaxies are very patchy in the ultraviolet, and the same has been observed in the far-infrared. As the angular distance between the true centers should increase with the size of the galaxy, one may model this as $r'^\circ_j - r^\circ_i \sim \mathcal{G}_2(0, \Gamma_0)$, where $\Gamma_0 = \nu_0^2\, \mathrm{Rot}^t(\gamma_i + \theta_i) \cdot \mathrm{Diag}(D_i^2, d_i^2) \cdot \mathrm{Rot}(\gamma_i + \theta_i)$.

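In the most general case,

$$r_{ij} \sim \mathcal{G}_2(0, \Gamma_{ij}),$$

with $\Gamma_{ij} = \Gamma_i + \Gamma'_j + \Gamma_0$. Once again, if $\sigma$, $\sigma'$, $\nu$, $\nu'$ and $\nu_0$ are unknown, the quantities $\hat\sigma = (\sigma^2 + \sigma'^2)^{1/2}$ and $\hat\nu = (\nu^2 + \nu'^2 + \nu_0^2)^{1/2}$ may be determined as indicated in Sects. 3.2 and 4.2.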
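The claim that only $\hat\sigma$ and $\hat\nu$ are identifiable follows from the fact that $\Gamma_i + \Gamma'_j + \Gamma_0$ has exactly the same form as a single $\Gamma_i$ with $(\sigma, \nu)$ replaced by $(\hat\sigma, \hat\nu)$. The sketch below checks this numerically, assuming, as in the text, that $\gamma'_j + \theta'_j = \gamma_i + \theta_i$ and that all three terms use the diameters $D_i$, $d_i$; parameter values and function names are hypothetical. It also checks the first equality in the model for $\Gamma_i$.

```python
import math

def rot(theta):
    """Rot(theta) as defined in the text: [[sin, cos], [-cos, sin]]."""
    s, c = math.sin(theta), math.cos(theta)
    return [[s, c], [-c, s]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def gamma_model(sigma, nu, D, d, phi):
    """sigma^2 Diag(1, 1) + nu^2 Rot^t(phi) . Diag(D^2, d^2) . Rot(phi)."""
    R = rot(phi)
    S = matmul(matmul(transpose(R), [[D * D, 0.0], [0.0, d * d]]), R)
    return [[sigma * sigma * (1.0 if i == j else 0.0) + nu * nu * S[i][j]
             for j in range(2)] for i in range(2)]

# Hypothetical values: D, d and sigmas in arcsec, nus dimensionless, phi = gamma_i + theta_i.
D, d, phi = 60.0, 25.0, 0.8
sigma, sigma_p, nu, nu_p, nu_0 = 0.6, 1.5, 0.01, 0.02, 0.03
G_i = gamma_model(sigma, nu, D, d, phi)
G_j = gamma_model(sigma_p, nu_p, D, d, phi)
G_0 = gamma_model(0.0, nu_0, D, d, phi)
G_ij = [[G_i[r][c] + G_j[r][c] + G_0[r][c] for c in range(2)] for r in range(2)]

sigma_hat = math.hypot(sigma, sigma_p)
nu_hat = math.sqrt(nu ** 2 + nu_p ** 2 + nu_0 ** 2)
G_hat = gamma_model(sigma_hat, nu_hat, D, d, phi)   # should equal G_ij
```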
However, as noticed by de Vaucouleurs & Head (1978) in a different context, if three samples with unknown uncertainties $\sigma_i$ ($i \in [\![1, 3]\!]$) are available and if the $\sigma_{i,j} = (\sigma_i^2 + \sigma_j^2)^{1/2}$ may be estimated for all the pairs $(i, j)_{i \neq j} \in [\![1, 3]\!]^2$, as in our case, then $\sigma_i$ may be determined for each sample. Paturel & Petit (1999) used this technique to compute the accuracy of galaxy coordinates.
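With three samples, the system $\sigma_{i,j}^2 = \sigma_i^2 + \sigma_j^2$ has three equations and three unknowns and can be solved in closed form, e.g. $\sigma_1^2 = (\sigma_{1,2}^2 + \sigma_{1,3}^2 - \sigma_{2,3}^2)/2$. A minimal sketch with a hypothetical function name; since the $\sigma_{i,j}$ are only estimates in practice, a negative variance signals inconsistent inputs and is reported rather than silently clipped.

```python
import math

def individual_sigmas(s12, s13, s23):
    """Solve sigma_{i,j}^2 = sigma_i^2 + sigma_j^2 for sigma_1, sigma_2, sigma_3."""
    v1 = (s12 ** 2 + s13 ** 2 - s23 ** 2) / 2
    v2 = (s12 ** 2 + s23 ** 2 - s13 ** 2) / 2
    v3 = (s13 ** 2 + s23 ** 2 - s12 ** 2) / 2
    if min(v1, v2, v3) < 0:
        raise ValueError("pairwise estimates are inconsistent (negative variance)")
    return math.sqrt(v1), math.sqrt(v2), math.sqrt(v3)

# Example: sigma = (1, 2, 3) gives sigma_{1,2}^2 = 5, sigma_{1,3}^2 = 10, sigma_{2,3}^2 = 13.
s1, s2, s3 = individual_sigmas(math.sqrt(5.0), math.sqrt(10.0), math.sqrt(13.0))
```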